Entity Extraction from the Web with WebKnox
Abstract
This paper describes a system for entity extraction from the web. The system uses three different extraction techniques, which are tightly coupled with mechanisms for retrieving entity-rich web pages. The main contributions of this paper are a new entity retrieval approach, a comparison of different extraction techniques, and a more precise entity extraction algorithm. The presented approach allows domain-independent information to be extracted from the web with only minimal human effort.
Similar resources
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
Automatic Construction of a Semantic, Domain-Independent Knowledge Base
In this paper, we want to show which difficulties arise when automatically constructing a domain-independent knowledge base from the web. We show possible applications for such a knowledge base to emphasize its importance. Current knowledge bases often use manually built patterns for extraction and quality assurance, which does not scale well. Our contribution to the community will be a technique...
WebKnox: Web Knowledge Extraction
The paper describes and evaluates a system for extracting knowledge from the web that uses a domain-independent fact extraction approach and a self-supervised learning algorithm. Using a trust algorithm, the precision of the system is improved to over 70% compared with a baseline of 52%.
Quality Impact of Value Matching and Scoring in Top-k Entity Attribute Extraction
The entity attribute extraction problem, or how to extract entities and their attribute values from natural language Web documents, is of critical importance for Web search and information access in general. Unfortunately, because of the noisy nature of the Web and its scale, entity attribute extraction is notoriously challenging in terms of both extraction efficiency and quality. In our earlier...
Recognizing Person Names by Injecting Candidate Name Words into Conditional Random Fields for the Arabic Language
Named entity recognition and extraction are very important tasks for discovering proper names, including persons, locations, dates, and times, inside electronic textual resources. An accurate named entity recognition system is an essential utility for resolving fundamental problems in question answering systems, summary extraction, information retrieval and extraction, machine translation, video interpr...
Publication date: 2009